DATA

Import

Import Modules

Import Data

Data Description

Data Sources

The Billboard Hot 100

https://en.wikipedia.org/wiki/Billboard_Hot_100

https://www.kaggle.com/datasets/dhruvildave/billboard-the-hot-100-songs

1.2M Songs with Metadata (CSV)

https://www.kaggle.com/datasets/rodolfofigueroa/spotify-12m-songs

8+ M. Spotify Tracks, Genre, Audio Features (SQL)

https://www.kaggle.com/datasets/maltegrosse/8-m-spotify-tracks-genre-audio-features

Spotify API

https://developer.spotify.com/documentation/web-api/

https://developer.spotify.com/console/get-search-item

https://developer.spotify.com/console/get-audio-features-track/

https://developer.spotify.com/documentation/web-api/reference/#/operations/get-audio-features

Spotipy Library: https://spotipy.readthedocs.io/en/master/
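A minimal sketch of how one Billboard entry could be matched to Spotify audio features with Spotipy. It assumes a client built as spotipy.Spotify(auth_manager=SpotifyClientCredentials()), which reads the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET environment variables. The helper names build_query and fetch_audio_features are hypothetical, and taking only the top search hit is an assumption, not necessarily how the notebook matched songs. The client is passed in as a parameter so a stub can stand in for it without network access.

```python
def build_query(song, performer):
    # Spotify search supports field filters such as track: and artist:
    return f"track:{song} artist:{performer}"


def fetch_audio_features(sp, song, performer):
    """Return the audio-features dict for the top search hit, or None.

    `sp` is expected to behave like spotipy.Spotify, i.e. expose
    `search(q=..., type=..., limit=...)` and `audio_features(tracks)`.
    """
    results = sp.search(q=build_query(song, performer), type="track", limit=1)
    items = results["tracks"]["items"]
    if not items:
        return None  # no match on Spotify for this song/performer pair
    features = sp.audio_features([items[0]["id"]])
    # audio_features returns a list; an entry can be None if no data exists
    return features[0] if features else None
```

The several-audio-features endpoint accepts up to 100 track IDs per request, so for thousands of songs it is cheaper to collect IDs first and then call audio_features in batches of 100.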

Data Description and Discussion

Spotify API Audio Feature Descriptions

Source: https://developer.spotify.com/documentation/web-api/reference/#/operations/get-several-audio-features

acousticness

number

A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

>= 0
<= 1

analysis_url

string

A URL to access the full audio analysis of this track. An access token is required to access this data.

danceability

number

Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

duration_ms

integer

The duration of the track in milliseconds.

energy

number

Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

id

string

The Spotify ID for the track.

instrumentalness

number

Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

key

integer

The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.

>= -1
<= 11
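The pitch-class encoding can be decoded with a simple lookup table. This is a sketch; the function name key_name is hypothetical, and enharmonic spellings are listed together in the same style as the C♯/D♭ example in the description.

```python
# Spotify key integers 0..11 in Pitch Class order, starting from C
PITCH_CLASSES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
                 "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]


def key_name(key):
    """Map a Spotify `key` integer to a pitch name; -1 (no key detected) gives None."""
    if key == -1:
        return None
    if not 0 <= key <= 11:
        raise ValueError(f"key must be in [-1, 11], got {key}")
    return PITCH_CLASSES[key]
```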

liveness

number

Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

loudness

number

The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
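Because decibels are logarithmic, a difference in loudness is easier to interpret as an amplitude ratio. The sketch below assumes the usual amplitude convention, dB = 20·log10(A / A_ref), with 0 dB as the reference level; the function names are hypothetical, and since Spotify's value is a track-wide average the resulting ratio is only a rough guide.

```python
def db_to_amplitude_ratio(db):
    """Convert a level in dB to an amplitude ratio relative to the 0 dB reference."""
    return 10 ** (db / 20)


def loudness_gap(db_a, db_b):
    """Approximate factor by which track A's amplitude exceeds track B's."""
    return db_to_amplitude_ratio(db_a - db_b)
```

Under this convention a track at -20 dB has one tenth the reference amplitude, and a 6 dB gap corresponds to roughly a doubling of amplitude.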

mode

integer

Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

speechiness

number

Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
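The three speechiness bands translate directly into a bucketing function. This is a sketch: the category labels are hypothetical, and because the description does not say which band the exact values 0.33 and 0.66 belong to, this version assumes both fall in the mixed "music and speech" band.

```python
def speechiness_category(speechiness):
    """Bucket a speechiness score using the documented 0.33 / 0.66 thresholds."""
    if not 0.0 <= speechiness <= 1.0:
        raise ValueError("speechiness must be between 0.0 and 1.0")
    if speechiness > 0.66:
        return "speech"            # probably made entirely of spoken words
    if speechiness >= 0.33:
        return "music and speech"  # e.g. rap, or speech in sections/layers
    return "music"                 # music and other non-speech-like tracks
```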

tempo

number

The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
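The link between tempo and average beat duration is BPM = 60 / (mean seconds per beat). The sketch below only illustrates that relationship from a list of beat onset times; it is not Spotify's tempo estimator, and the function name is hypothetical.

```python
def tempo_from_beats(beat_times):
    """Estimate tempo in BPM from sorted beat onset times given in seconds."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats")
    # Duration of each beat = gap between consecutive onsets
    durations = [b - a for a, b in zip(beat_times, beat_times[1:])]
    average_beat = sum(durations) / len(durations)
    return 60.0 / average_beat
```

Beats every half second, for example, give 60 / 0.5 = 120 BPM.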

time_signature

integer

An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7, indicating time signatures of "3/4" to "7/4".

>= 3
<= 7

track_href

string

A link to the Web API endpoint providing full details of the track.

type

string

The object type.

Allowed value: "audio_features"

uri

string

The Spotify URI for the track.

valence

number

A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

>= 0
<= 1
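The documented bounds above can double as a sanity check when cleaning the data, for example to flag rows that should be treated as data errors. This is a sketch: it covers only features whose range the descriptions state, and the names FEATURE_RANGES and out_of_range are hypothetical.

```python
# Inclusive bounds taken from the feature descriptions above
FEATURE_RANGES = {
    "acousticness": (0.0, 1.0),
    "danceability": (0.0, 1.0),
    "energy": (0.0, 1.0),
    "valence": (0.0, 1.0),
    "key": (-1, 11),
    "mode": (0, 1),
    "time_signature": (3, 7),
}


def out_of_range(row):
    """Return the names of features in `row` (a dict) outside their documented range.

    Features missing from the row, or set to None, are not flagged.
    """
    bad = []
    for name, (low, high) in FEATURE_RANGES.items():
        value = row.get(name)
        if value is not None and not low <= value <= high:
            bad.append(name)
    return bad
```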

Descriptive Statistics and Data Features

Proportion of Songs With Audio Feature Data:

Approximately 75% of songs on the Billboard list are available on Spotify and were not removed for data errors.
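One way to compute a coverage figure like this is to deduplicate chart entries into unique (song, performer) pairs and count how many of them matched valid audio features. This is a sketch under that assumption: the source does not say whether the proportion is per unique song or per chart entry, and the function name is hypothetical.

```python
def spotify_coverage(billboard_songs, matched_songs):
    """Fraction of unique Billboard (song, performer) pairs that have audio features.

    billboard_songs: iterable of (song, performer) tuples, one per chart entry.
    matched_songs: iterable of (song, performer) tuples with valid feature data.
    """
    unique = set(billboard_songs)
    if not unique:
        return 0.0
    return len(unique & set(matched_songs)) / len(unique)
```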

EXPLORATORY DATA ANALYSIS

Histograms

Time Series

NOTE:

Comparative Time Series

Time Series Counts - Preferred Date Filter

CONCLUSION: We should only include data up to, but not including, 2021.
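The date filter amounts to keeping rows dated strictly before 1 January 2021. A plain-Python sketch, assuming each row is a dict whose date is either a datetime.date or an ISO string such as "2020-12-26"; the name before_cutoff and the "date" key are hypothetical.

```python
from datetime import date

# First excluded day: data from 2021 onward is dropped
CUTOFF = date(2021, 1, 1)


def before_cutoff(rows, date_key="date"):
    """Keep rows whose date falls strictly before CUTOFF."""
    kept = []
    for row in rows:
        value = row[date_key]
        if isinstance(value, str):
            value = date.fromisoformat(value)  # parse "YYYY-MM-DD"
        if value < CUTOFF:
            kept.append(row)
    return kept
```

With pandas, assuming the date column has been parsed with pd.to_datetime, the equivalent is a boolean mask such as df[df["date"] < "2021-01-01"].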

Billboard Charts Historical Plots

Correlation Analysis
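A correlation analysis of the audio features typically starts from pairwise Pearson coefficients. The sketch below computes them from scratch, assuming the features are held as a dict of equal-length numeric lists; the function names are hypothetical, and the notebook may well have used pandas instead, where DataFrame.corr() defaults to the Pearson method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of feature name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```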

Genres

Import Genre Data

Too Many Genres!!!

Regex Filtering to Create Subgenre Groups
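Regex filtering collapses many fine-grained genre labels into a few broad groups by matching keywords. The patterns and group names below are hypothetical, since the notebook's actual groupings are not shown here. Patterns are tried in order and the first match wins, so a label such as "pop rap" lands in the hip hop group because that pattern is checked before "pop".

```python
import re

# Hypothetical keyword patterns, checked in order; first match wins
GENRE_PATTERNS = [
    ("hip hop", re.compile(r"hip hop|rap|trap")),
    ("rock", re.compile(r"rock|metal|punk|grunge")),
    ("pop", re.compile(r"pop")),
    ("country", re.compile(r"country|bluegrass")),
    ("r&b", re.compile(r"r&b|soul|funk")),
]


def genre_group(genre):
    """Return the first group whose pattern matches `genre`, else 'other'."""
    lowered = genre.lower()
    for group, pattern in GENRE_PATTERNS:
        if pattern.search(lowered):
            return group
    return "other"
```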

Boxplots

Conclusion: Adding more generalised genre data appears to increase variability.

Test Case 2: Pop Music

Boxplots for Large Genre Groups